On the Convergence of Decentralized Gradient Descent
Abstract
Consider the consensus problem of minimizing f(x) = ∑_{i=1}^n f_i(x), where each f_i is known only to an individual agent i belonging to a connected network of n agents. All agents collaboratively solve this problem and obtain the solution via data exchanges only between neighboring agents. Such algorithms avoid the need for a fusion center, offer better network load balance, and improve data privacy. We study the decentralized gradient descent method, in which each agent i updates its variable x_(i), a local approximation of the unknown variable x, by averaging its neighbors' variables and then taking a local negative gradient step −α∇f_i(x_(i)). The iteration is

x_(i)(k+1) ← ∑_j w_ij x_(j)(k) − α∇f_i(x_(i)(k)),  for each agent i,

where the coefficients w_ij form a symmetric doubly stochastic matrix W = [w_ij] ∈ R^{n×n}. Since agent i does not communicate with non-neighbors, w_ij ≠ 0 only if i = j or j is a neighbor of i. We analyze the convergence of this iteration and derive its rate, assuming that each f_i is proper, closed, convex, and lower bounded, that ∇f_i is Lipschitz continuous with constant L_{f_i}, and that the stepsize α is fixed. Provided that α < O(1/L_h), where L_h = max_i L_{f_i}, the objective error at the averaged solution, f((1/n)∑_i x_(i)(k)) − f*, where f* is the optimal objective value, decreases at a rate of O(1/k) until it reaches O(α). If the f_i are (restricted) strongly convex, then both (1/n)∑_i x_(i)(k) and each x_(i)(k) converge to the global minimizer x* at a linear rate until reaching an O(α)-neighborhood of x*. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an O(α)-neighborhood of the true sparse signal. This analysis reveals how convergence depends on the stepsize, function convexity, and network spectrum.
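The iteration above can be sketched in a few lines of code. The following is a minimal illustration only, assuming hypothetical quadratic local objectives f_i(x) = ½‖A_i x − b_i‖² on a ring network with a simple symmetric doubly stochastic mixing matrix; the names dgd, A_list, and b_list are illustrative and not taken from the paper.

```python
# Minimal sketch of the DGD iteration described above (not the authors' code).
import numpy as np

def dgd(A_list, b_list, W, alpha, num_iters=500):
    """Run decentralized gradient descent; returns the stacked local iterates."""
    n = len(A_list)                      # number of agents
    d = A_list[0].shape[1]               # dimension of x
    X = np.zeros((n, d))                 # row i holds agent i's local copy x_(i)
    for _ in range(num_iters):
        grads = np.stack([A.T @ (A @ x - b)          # ∇f_i(x_(i)) for each agent
                          for A, b, x in zip(A_list, b_list, X)])
        X = W @ X - alpha * grads        # average over neighbors, then gradient step
    return X

# Example: 5 agents on a ring, each holding a random least-squares term.
rng = np.random.default_rng(0)
n, d = 5, 3
A_list = [rng.standard_normal((10, d)) for _ in range(n)]
b_list = [rng.standard_normal(10) for _ in range(n)]

# Symmetric doubly stochastic mixing matrix for a ring: weight 1/3 on self and both neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

X = dgd(A_list, b_list, W, alpha=0.01)   # alpha chosen well below 1/L_h for this data
x_avg = X.mean(axis=0)                   # the averaged solution (1/n) ∑_i x_(i)(k)
print("averaged iterate:", x_avg)
```

With a fixed stepsize, the local copies cluster around (but do not exactly reach) the minimizer, matching the O(α)-neighborhood behavior described in the abstract.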
Similar articles
A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates
This paper considers the problem of decentralized optimization with a composite objective containing smooth and non-smooth terms. To solve the problem, a proximal-gradient scheme is studied. Specifically, the smooth and nonsmooth terms are dealt with by gradient update and proximal update, respectively. The studied algorithm is closely related to a previous decentralized optimization algorithm,...
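As a rough illustration of the gradient-plus-proximal idea mentioned above, the sketch below shows one generic decentralized proximal-gradient step for an ℓ1 regularizer; it is an assumed, simplified form and not the specific algorithm studied in the referenced paper.

```python
# Generic sketch of one decentralized proximal-gradient step (illustrative only).
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def decentralized_prox_grad_step(X, grads, W, alpha, lam):
    """X, grads: (n, d) arrays of local iterates and local smooth gradients;
    W: (n, n) mixing matrix; alpha: stepsize; lam: l1 weight."""
    Y = W @ X - alpha * grads              # gradient update on the smooth part
    return soft_threshold(Y, alpha * lam)  # proximal update on the nonsmooth part
```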
D$^2$: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too differ...
A new Levenberg-Marquardt approach based on conjugate gradient structure for solving absolute value equations
In this paper, we present a new approach for solving the absolute value equation (AVE) which uses the Levenberg-Marquardt method with a conjugate subgradient structure. In conjugate subgradient methods, the new direction is obtained by combining the steepest descent direction and the previous direction, which may not lead to good numerical results. Therefore, we replace the steepest descent dir...
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technology to improve the efficiency of parallelism in distributed machine learning platforms and has been widely used in many popular machine learning software packages and solvers based on centrali...
An eigenvalue study on the sufficient descent property of a modified Polak-Ribière-Polyak conjugate gradient method
Based on an eigenvalue analysis, a new proof for the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.
Two Settings of the Dai-Liao Parameter Based on Modified Secant Equations
Following the setting of the Dai-Liao (DL) parameter in conjugate gradient (CG) methods, we introduce two new parameters based on the modified secant equation proposed by Li et al. (Comput. Optim. Appl. 202:523-539, 2007) with two approaches, which use an extended new conjugacy condition. The first is based on a modified descent three-term search direction, as the descent Hest...
Journal: SIAM Journal on Optimization
Volume: 26
Published: 2016